SHAP-Based Supervised Clustering for Sample Classification and the Generalized Waterfall Plot
In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex input-output relationships. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, we associate a SHAP value that quantifies the contribution of that feature to the prediction of that sample. Clustering these SHAP values can provide insight into the data by grouping samples that not only received the same prediction, but received the same prediction for similar reasons. In doing so, we map the various pathways through which distinct samples arrive at the same prediction. To showcase this methodology, we present a simulated experiment in addition to a case study in Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We also present a novel generalization of the waterfall plot for multi-class classification.
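The core recipe (one SHAP value per sample and feature, then clustering the resulting explanation profiles) can be sketched in a few lines. The sketch below is illustrative only: it uses a linear model, for which the exact SHAP value of feature j for sample i has the closed form w_j (x_ij - E[x_j]) when features are independent, and k-means as a stand-in clustering algorithm; none of these choices are claimed to match the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
w = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ w  # predictions of a known linear model

# For a linear model with independent features, the exact SHAP value of
# feature j for sample i is w_j * (x_ij - E[x_j]).  Each row of phi is
# one sample's "explanation profile".
phi = w * (X - X.mean(axis=0))

# Cluster explanation profiles: samples landing in the same cluster
# received similar predictions for similar reasons.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(phi)
```

A useful sanity check is SHAP's additivity property: each row of `phi` sums to that sample's prediction minus the average prediction.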
A.2 Proof of Theorem 1. By Lemma 1, linearity of expectation, and the fact that each RWT is independent of the other tours by the Strong Markov Property, Theorem 1 holds. MHM-GNN can recover edge-based models whose representations do not use graph-wide structural information. However, on Rent the Runway the raw features achieve the highest performance; that is, structural information does not seem to be relevant to this specific task. All hyperparameters were chosen to minimize training loss. For k = 5, we used a minibatch of size 5 in all datasets.
Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents
Jagadeesh, George, Iyer, Srikrishna, Polanowski, Michal, Thia, Kai Xin
This study examines the feasibility of applying large language models (LLMs) to forecasting the impact of traffic incidents on traffic flow. The use of LLMs for this task has several advantages over existing machine learning-based solutions, such as not requiring a large training dataset and the ability to utilize free-text incident logs. We propose a fully LLM-based solution that predicts the incident impact using a combination of traffic features and LLM-extracted incident features. A key ingredient of this solution is an effective method of selecting examples for the LLM's in-context learning. We evaluate the performance of three advanced LLMs and two state-of-the-art machine learning models on a real traffic incident dataset. The results show that the best-performing LLM matches the accuracy of the most accurate machine learning model, despite the former not having been trained on this prediction task. The findings indicate that LLMs are a practically viable option for traffic incident impact prediction.
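One simple way to realize the example-selection step the abstract highlights is nearest-neighbor retrieval over traffic features, followed by assembling the retrieved (incident log, observed delay) pairs into an in-context prompt. The function names, prompt template, and toy data below are all invented for illustration; the paper's actual selection method and template may differ.

```python
import numpy as np

def select_examples(query_feats, bank_feats, bank_records, k=3):
    """Pick the k historical incidents closest to the query in traffic-
    feature space -- one plausible stand-in for example selection."""
    d = np.linalg.norm(bank_feats - query_feats, axis=1)
    return [bank_records[i] for i in np.argsort(d)[:k]]

def build_prompt(query_log, examples):
    """Assemble an in-context prompt from (incident log, delay) pairs."""
    lines = ["Predict the traffic delay caused by the incident."]
    for log, delay in examples:
        lines.append(f"Incident: {log}\nDelay: {delay} min")
    lines.append(f"Incident: {query_log}\nDelay:")
    return "\n\n".join(lines)

# Hypothetical history bank: features (e.g., flow, occupancy) + records.
bank_feats = np.array([[2.0, 1.0], [8.0, 3.0], [2.5, 0.5]])
bank_records = [("stalled car, 2 lanes", 12),
                ("multi-vehicle crash", 45),
                ("debris on shoulder", 8)]
prompt = build_prompt("stalled truck, 2 lanes",
                      select_examples(np.array([2.1, 0.9]),
                                      bank_feats, bank_records))
```

The resulting string would then be sent to any chat-completion LLM, whose reply is parsed as the predicted delay.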
Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models
Aracena, Gabriel, Luster, Kyle, Santos, Fabio, Steinmacher, Igor, Gerosa, Marco A.
Effective prioritization of issue reports in software engineering helps to optimize resource allocation and information recovery. However, manual issue classification is laborious and lacks scalability. As an alternative, many open source software (OSS) projects employ automated processes for this task, yet this method often relies on large datasets for adequate training. Traditionally, machine learning techniques have been used for issue classification. More recently, large language models (LLMs) have emerged as powerful tools for addressing a range of software engineering challenges, including code and test generation, mapping new requirements to legacy software endpoints, and conducting code reviews. The following research investigates an automated approach to issue classification based on LLMs. By leveraging the capabilities of such models, we aim to develop a robust system for prioritizing issue reports, mitigating the necessity for extensive training data while also maintaining reliability in classification. In our research, we developed an LLM-based approach for accurately labeling issues by selecting two of the most prominent large language models. We then compared their performance across multiple datasets. Our findings show that GPT-4o achieved the best results in classifying issues from the NLBSE 2024 competition. Moreover, GPT-4o outperformed DeepSeek R1, achieving an F1 score 20% higher when both models were trained on the same dataset from the NLBSE 2023 competition, which was ten times larger than the NLBSE 2024 dataset. The fine-tuned GPT-4o model attained an average F1 score of 80.7%, while the fine-tuned DeepSeek R1 model achieved 59.33%. Increasing the dataset size did not improve the F1 score, which reduces the dependence on massive datasets for building an efficient solution to issue classification.
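The labeling step such a system needs can be sketched as a prompt builder plus a parser that maps the model's free-text reply onto a fixed label set. Everything below (the label set, the template, the `call_llm` placeholder) is a hypothetical illustration, not the study's actual implementation.

```python
# Minimal sketch of LLM-based issue labeling.  A real system would
# replace call_llm with an actual chat-completion API call.
LABELS = ("bug", "feature", "question")

def build_prompt(title, body):
    """Zero-shot classification prompt; wording is invented."""
    return (f"Classify the GitHub issue as one of: {', '.join(LABELS)}.\n"
            f"Title: {title}\nBody: {body}\nLabel:")

def parse_label(answer, default="question"):
    """Map a free-text model reply onto the first matching label."""
    answer = answer.lower()
    for label in LABELS:
        if label in answer:
            return label
    return default
```

Evaluation then reduces to comparing `parse_label(call_llm(build_prompt(...)))` against gold labels with the usual F1 machinery.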
Sparse Autoencoder Features for Classifications and Transferability
Gallifant, Jack, Chen, Shan, Sasse, Kuleen, Aerts, Hugo, Hartvigsen, Thomas, Bitterman, Danielle S.
Sparse Autoencoders (SAEs) show potential for uncovering structured, human-interpretable representations in Large Language Models (LLMs), making them a crucial tool for transparent and controllable AI systems. We systematically analyze SAEs for interpretable feature extraction from LLMs in safety-critical classification tasks. Our framework evaluates (1) model-layer selection and scaling properties, (2) SAE architectural configurations, including width and pooling strategies, and (3) the effect of binarizing continuous SAE activations. SAE-derived features achieve macro F1 > 0.8, outperforming hidden-state and BoW baselines while demonstrating cross-model transfer from Gemma 2 2B to 9B-IT models. These features generalize in a zero-shot manner to cross-lingual toxicity detection and visual classification tasks. Our analysis highlights the significant impact of pooling strategies and binarization thresholds, showing that binarization offers an efficient alternative to traditional feature selection while maintaining or improving performance. These findings establish new best practices for SAE-based interpretability and enable scalable, transparent deployment of LLMs in real-world applications. Full repo: https://github.com/shan23chen/MOSAIC.
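The pooling and binarization choices the abstract ablates can be illustrated end to end on simulated activations. The sketch below stands in random sparse tensors for real per-token SAE activations and uses max pooling with a zero threshold plus a logistic-regression probe; these specific choices are assumptions for illustration, not the paper's reported best configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-in for SAE activations: (samples, tokens, sae_width), mostly zero,
# mimicking the sparsity of real SAE feature activations.
acts = rng.random((200, 16, 64)) * (rng.random((200, 16, 64)) < 0.05)

# Pooling over the token axis collapses each sample to one feature vector.
pooled = acts.max(axis=1)          # max pooling; mean pooling also common

# Binarization keeps only "did this feature fire at all on this sample".
binary = (pooled > 0.0).astype(np.float64)

# Toy labels defined from two features, then a simple linear probe.
y = (binary[:, 0] + binary[:, 1] > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(binary, y)
```

Swapping `max` for `mean`, or varying the binarization threshold, reproduces the kind of ablation grid the paper describes.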
SPOCK 2.0: Update to the FeatureClassifier in the Stability of Planetary Orbital Configurations Klassifier
Thadhani, Elio, Ba, Yolanda, Rein, Hanno, Tamayo, Daniel
ABSTRACT The Stability of Planetary Orbital Configurations Klassifier (SPOCK) package collects machine learning models for predicting the stability and collisional evolution of compact planetary systems. In this paper we explore improvements to SPOCK's binary stability classifier (FeatureClassifier), which predicts orbital stability by collecting data over a short N-body integration of a system. We additionally discovered that 10% of N-body integrations in SPOCK's original training dataset were duplicated by accident, and that < 1% were misclassified as stable when they in fact led to ejections. We provide a cleaned dataset of 100,000+ unique integrations, release a newly trained stability classification model, and make minor updates to the API. We also exclude systems that go unstable during the short integration phase, which slightly reduces the model AUC from 0.9527 to 0.9490 (an AUC of 1 would be a perfect model). INTRODUCTION Determining orbital stability over planetary systems' typical lifetimes of several Gyr through direct numerical
Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science
Bethel, E. Wes, Cramer, Vianna, del Rio, Alexander, Narins, Lothar, Pestano, Chris, Verma, Satvik, Arias, Erick, Bertelli, Nicola, Perciano, Talita, Shiraiwa, Syun'ichi, Villar, Álvaro Sánchez, Wallace, Greg, Wright, John C.
This work presents a detailed case study on using Generative AI (GenAI) to develop AI surrogates for simulation models in fusion energy research. The scope includes the methodology, implementation, and results of using GenAI to assist in model development and optimization, comparing these results with previous manually developed models.
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Vacareanu, Robert, Negru, Vlad-Andrei, Suciu, Vasile, Surdeanu, Mihai
We analyze how well pre-trained large language models (e.g., Llama 2, GPT-4, Claude 3) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow the notion of regret from online learning and empirically show that LLMs are capable of obtaining sub-linear regret.
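The setup described here amounts to serializing (x, y) pairs as text and asking the model for the next y. A minimal serializer can be sketched as follows; the exact formatting, field names, and precision are assumptions for illustration, not the paper's template.

```python
# Sketch: cast a regression problem as an in-context text prompt.
def regression_prompt(xs, ys, x_query, fmt="{:.2f}"):
    """Serialize training pairs as Input/Output lines, then ask for the
    output of the query point.  Template wording is invented."""
    lines = [f"Input: {fmt.format(x)}\nOutput: {fmt.format(y)}"
             for x, y in zip(xs, ys)]
    lines.append(f"Input: {fmt.format(x_query)}\nOutput:")
    return "\n\n".join(lines)

prompt = regression_prompt([1.0, 2.0, 3.0], [2.1, 3.9, 6.2], 4.0)
```

The model's completion after the final `Output:` is parsed as a number; regret is then measured by accumulating prediction error as exemplars stream in one at a time.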
Time Series Diffusion in the Frequency Domain
Crabbé, Jonathan, Huynh, Nicolas, Stanczuk, Jan, van der Schaar, Mihaela
Fourier analysis has been an instrumental tool in the development of signal processing. This leads us to wonder whether this framework could similarly benefit generative modelling. In this paper, we explore this question through the scope of time series diffusion models. More specifically, we analyze whether representing time series in the frequency domain is a useful inductive bias for score-based diffusion models. By starting from the canonical SDE formulation of diffusion in the time domain, we show that a dual diffusion process occurs in the frequency domain with an important nuance: Brownian motions are replaced by what we call mirrored Brownian motions, characterized by mirror symmetries among their components. Building on this insight, we show how to adapt the denoising score matching approach to implement diffusion models in the frequency domain. This results in frequency diffusion models, which we compare to canonical time diffusion models. Our empirical evaluation on real-world datasets, covering various domains like healthcare and finance, shows that frequency diffusion models better capture the training distribution than time diffusion models. We explain this observation by showing that time series from these datasets tend to be more localized in the frequency domain than in the time domain, which makes them easier to model in the former case. All our observations point towards impactful synergies between Fourier analysis and diffusion models.
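The mirror symmetry underlying the "mirrored Brownian motions" is the Hermitian symmetry of the discrete Fourier transform of a real signal: if the time-domain noise is real, its frequency-domain counterpart satisfies X[k] = conj(X[N-k]). A few lines of NumPy make this concrete (this is a generic property of the DFT, shown here as a sanity check rather than the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Real-valued Gaussian noise in the time domain (one diffusion step)...
eps_time = rng.normal(size=128)
# ...maps to complex noise in the frequency domain whose components are
# mirror-symmetric: X[k] == conj(X[N-k]) for a real input signal.
eps_freq = np.fft.fft(eps_time)

k = np.arange(1, 64)
assert np.allclose(eps_freq[k], np.conj(eps_freq[128 - k]))
```

Because of this constraint, only about half of the frequency components are free parameters, which is the nuance a frequency-domain diffusion process has to respect.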